A Bayesian Insertion/Deletion Algorithm for Distant Protein Motif Searching via Entropy Filtering

Authors

  • Jun XIE
  • Ker-Chau LI
  • Minou BINA
Abstract

Bayesian models have been developed that find ungapped motifs in multiple protein sequences. In this article, we extend the model to allow for deletions and insertions in motifs. Direct generalization of the ungapped algorithm, based on Gibbs sampling, proved unsuccessful because the configuration space became much larger. To alleviate the convergence difficulty, a two-stage procedure is introduced. At the first stage, we develop a method called entropy filtering, which quickly searches for “good” starting points for the alignment approach without regard to deletion/insertion patterns. At the second stage, we switch to an algorithm that generates both a random vector that represents insertion/deletion patterns and a random variable of motif locations. After the two steps, gapped-motif alignments are obtained for multiple sequences. When applied to datasets that consist of helix–loop–helix proteins and high mobility group proteins, respectively, our methods show great improvements over those that produce ungapped alignments.
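The abstract does not say how entropy filtering scores candidate starting points, so the following is only a minimal sketch of one generic idea it may relate to: ranking ungapped candidate blocks by the summed Shannon entropy of their columns, where lower entropy suggests stronger conservation. The function names `column_entropy` and `block_entropy`, and the toy segments, are assumptions made for illustration and are not taken from the article.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def block_entropy(segments):
    """Sum of column entropies over equal-length, ungapped segments.

    Hypothetical scoring helper: a lower value means the columns are
    more conserved across the segments.
    """
    width = len(segments[0])
    if any(len(s) != width for s in segments):
        raise ValueError("all segments must have the same length")
    # zip(*segments) iterates over the block column by column.
    return sum(column_entropy(col) for col in zip(*segments))


# Toy data (invented): one well-conserved block and one unrelated block.
conserved = ["ACDE", "ACDE", "ACDF"]
diverse = ["ACDE", "WYKL", "GHMN"]
```

Under this assumed score, `block_entropy(conserved)` comes out smaller than `block_entropy(diverse)`, so the conserved block would be preferred as a starting point for a subsequent gapped search.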


Similar resources

Markovian Structures in Biological Sequence

Summary: The alignment of multiple homologous biopolymer sequences is crucial in research on protein modeling and engineering, molecular evolution, and prediction of both gene function and gene product structure. In this article, we provide a coherent view of the two recent models used for multiple sequence alignment, the hidden Markov model (HMM) and the block-based motif model, in order...

Determination of Maximum Bayesian Entropy Probability Distribution

In this paper, we consider methods for determining maximum entropy multivariate distributions with a given prior, under the constraints that the marginal distributions, or the marginals and the covariance matrix, are prescribed. Next, some numerical solutions are considered for cases in which no closed-form solution is available. Finally, these methods are illustrated via some numerical examples.

Bayesian Phylogenetic Inference under a Statistical Insertion-Deletion Model

A central problem in computational biology is the inference of phylogeny given a set of DNA or protein sequences. Currently, this problem is tackled stepwise, with phylogenetic reconstruction dependent on an initial multiple sequence alignment step. However, these two steps are fundamentally interdependent. Whether the main interest is in sequence alignment or phylogeny, a major goal of computat...

Identification of motifs with insertions and deletions in protein sequences using self-organizing neural networks

The problem of motif identification in protein sequences has been studied for many years in the literature. Current popular algorithms for motif identification in protein sequences face two difficulties: high computational cost and the possibility of insertions and deletions. In this paper, we provide a new strategy that solves the problem more efficiently. We develop a self-organizing neural net...

The study on the spam filtering technology based on Bayesian algorithm

This paper analyzed spam filtering technology, carried out a detailed study of Naive Bayes algorithm, and proposed the improved Naive Bayesian mail filtering technology. Improvement can be seen in text selection as well as feature extraction. The general Bayesian text classification algorithm mostly takes information gain and cross-entropy algorithm in feature selection. Through the principle o...


Publication date: 2004